Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
Authors
Abstract
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics of human language: distinctiveness for each caption and diversity across different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting with human-written captions and image-mismatched captions, the caption generator effectively exploits the inherent characteristics of human languages and generates more discriminative captions. We show that our proposed network is capable of producing accurate and diverse captions across images.
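The comparative idea described above can be illustrated with a small sketch: rather than scoring one image-caption pair in isolation, a discriminator embeds the image and a set of candidate captions (generated, human-written, and image-mismatched) into a joint space and assigns each caption a relative probability via a softmax over their similarities. This is only a minimal illustration of the scoring principle, not the authors' actual network; the embeddings, the `temperature` parameter, and the function name are assumptions for the example.

```python
import numpy as np

def comparative_scores(image_emb, caption_embs, temperature=1.0):
    """Score each caption relative to the others for one image.

    image_emb:    (d,) image embedding in the joint space
    caption_embs: (n, d) embeddings of n candidate captions
    Returns an (n,) probability vector: each caption's quality
    depends on how it compares with the rest of the set.
    """
    # Cosine similarity between the image and every candidate caption.
    img = image_emb / np.linalg.norm(image_emb)
    caps = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    sims = caps @ img
    # Softmax over the caption set: scores are inherently comparative,
    # so a generic caption that fits many images is penalized relative
    # to a distinctive one that fits this image best.
    logits = sims / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

Because the scores are normalized over the whole caption set, the generator is pushed toward captions that beat both the mismatched and the generic alternatives, which is what encourages distinctiveness.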
Similar resources
Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning
State-of-the-art approaches for image captioning require supervised training data consisting of captions with paired image data. These methods are typically unable to use unsupervised data such as textual data with no corresponding images, which is a much more abundant commodity. We here propose a novel way of using such textual data by artificially generating missing visual information. We eva...
Text-Guided Attention Model for Image Captioning
Visual attention plays an important role in understanding images and has demonstrated its effectiveness in generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns t...
The Comparative Effect of Visual vs. Auditory Input Enhancement on Learning Non-Congruent Phrasal Verbs by Iranian EFL Learners
Vocabulary is one of the essential components of language and learning phrasal verbs as part of vocabulary is quite challenging for foreign language learners. The present study aimed at investigating the effects of visual and auditory input enhancement on learning non-congruent phrasal verbs. The participants of the study were 90 intermediate English language learners who were divided into two ...
The effects of captioning texts and caption ordering on L2 listening comprehension and vocabulary learning
This study investigated the effects of captioned texts on second/foreign (L2) listening comprehension and vocabulary gains using a computer multimedia program. Additionally, it explored the caption ordering effect (i.e. captions displayed during the first or second listening), and the interaction of captioning order with the L2 proficiency level of language learners in listening comprehension a...
Adversarial-Playground: A Visualization Suite for Adversarial Sample Generation
With growing interest in adversarial machine learning, it is important for practitioners and users of machine learning to understand how their models may be attacked. We present a web-based visualization tool, ADVERSARIALPLAYGROUND, to demonstrate the efficacy of common adversarial methods against a convolutional neural network. ADVERSARIAL-PLAYGROUND provides users an efficient and effective e...
Publication date: 2018